Improving translation memory fuzzy matching by paraphrasing
Abstract
Computer-assisted translation (CAT) tools have become the major language technology for supporting and facilitating the translation process. These programs store previously translated source texts and their equivalent target texts in a database and retrieve related segments during the translation of new texts. However, most of them rely on string or word edit distance, which prevents them from retrieving matches that are similar in meaning but differ in wording. In this paper we present an innovative approach for matching sentences that have different words but the same meaning. We use NooJ to create paraphrases of Support Verb Constructions (SVC) for all source translation units, expanding the fuzzy matching capabilities when searching the translation memory (TM). Our first results for the EN-IT language pair show consistent and significant improvements in matching over state-of-the-art CAT systems, across different text domains.
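The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper generates paraphrases with NooJ, whereas the small `SVC_PARAPHRASES` table here is a hypothetical stand-in, and the score (one minus word edit distance divided by the longer segment length) is only one common way of defining a fuzzy match score. All function names are assumptions made for illustration.

```python
def word_edit_distance(a, b):
    """Levenshtein distance computed over word tokens instead of characters."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_score(query, candidate):
    """Similarity in [0, 1]; 1.0 means the token sequences are identical."""
    q, c = query.lower().split(), candidate.lower().split()
    longest = max(len(q), len(c))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(q, c) / longest


# Hypothetical paraphrase table (assumption): support verb construction -> simple verb.
# The paper derives such paraphrases with NooJ grammars; this dict is only a stand-in.
SVC_PARAPHRASES = {
    "make a decision": "decide",
    "take a look at": "examine",
}


def expand(segment):
    """Return the segment together with its paraphrased variants."""
    variants = {segment}
    lowered = segment.lower()
    for svc, verb in SVC_PARAPHRASES.items():
        if svc in lowered:
            variants.add(lowered.replace(svc, verb))
    return variants


def best_match(query, tm_sources, use_paraphrases):
    """Return (score, original TM source) for the best-scoring segment."""
    best = (0.0, None)
    for source in tm_sources:
        variants = expand(source) if use_paraphrases else {source}
        score = max(fuzzy_score(query, variant) for variant in variants)
        if score > best[0]:
            best = (score, source)
    return best


tm = ["The committee will make a decision tomorrow"]
query = "The committee will decide tomorrow"
plain = best_match(query, tm, use_paraphrases=False)
expanded = best_match(query, tm, use_paraphrases=True)
```

Here the plain word-level score falls below a typical fuzzy match threshold because three of seven tokens differ, while the paraphrase-expanded lookup matches the same TM segment exactly. A realistic system would also need to handle inflected forms such as "made a decision", which this literal string lookup does not.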
Similar resources
Can Translation Memories afford not to use paraphrasing?
This paper investigates to what extent the use of paraphrasing in translation memory (TM) matching and retrieval is useful for human translators. Current translation memories lack semantic knowledge like paraphrasing in matching and retrieval. Due to this, paraphrased segments are often not retrieved. Lack of semantic knowledge also results in inappropriate ranking of the retrieved segments. Gu...
Incorporating Paraphrasing in Translation Memory Matching and Retrieval
Current Translation Memory (TM) systems work at the surface level and lack semantic knowledge while matching. This paper presents an approach to incorporating semantic knowledge in the form of paraphrasing in matching and retrieval. Most TMs use Levenshtein edit distance or some variation of it. Generating additional segments based on the paraphrases available in a segment results in expo...
Improving fuzzy matching through syntactic knowledge
Fuzzy matching in translation memories (TM) is mostly string-based in current CAT tools. These tools look for TM sentences highly similar to an input sentence, using edit distance to detect the differences between sentences. Current CAT tools use limited or no linguistic knowledge in this procedure. In the recently started SCATE project, which aims at improving translators’ efficiency, we apply...
Assessing linguistically aware fuzzy matching in translation memories
The concept of fuzzy matching in translation memories can take place using linguistically aware or unaware methods, or a combination of both. We designed a flexible and time-efficient framework which applies and combines linguistically unaware or aware metrics in the source and target language. We measure the correlation of fuzzy matching metric scores with the evaluation score of the suggested...
Semantics-based pretranslation for SMT using fuzzy matches
Semantic knowledge has been adopted recently for SMT preprocessing, decoding and evaluation, in order to be able to compare sentences based on their meaning rather than on mere lexical and syntactic similarity. Little attention has been paid to semantic knowledge in the context of integrating fuzzy matches from a translation memory with SMT. We present work in progress which focuses on semantic...
Publication date: 2015